A Sparse Plus Low Rank Maximum Entropy Language Model

Authors

  • Brian Hutchinson
  • Mari Ostendorf
  • Maryam Fazel
Abstract

This work introduces a new maximum entropy language model that decomposes the model parameters into a low rank component that learns regularities in the training data and a sparse component that learns exceptions (e.g., multiword expressions). The low rank component corresponds to a continuous-space language model. This model generalizes the standard ℓ1-regularized maximum entropy model and has an efficient accelerated first-order training algorithm. In conversational speech language modeling experiments, we see perplexity reductions of 2-5%.

Similar Resources

Identifying Broad and Narrow Financial Risk Factors with Convex Optimization

Factor analysis of security returns aims to decompose a return covariance matrix into systematic and specific risk components. To date, most commercially successful factor analysis has been based on fundamental models, although there is a large academic literature on statistical models. While successful in many respects, traditional statistical approaches like principal component analysis and m...

Phonetic Variation Analysis Via Multi-Factor Sparse Plus Low Rank Language Model

Phonetic transcriptions contain rich information about language. First, the sequential patterns in phonetic transcripts reveal information about the language’s phonotactics. When combined with lexical information, this can help to grow or correct pronunciation dictionaries and to improve grapheme-to-phoneme prediction. Second, the places where pronunciations deviate from the norm can be equally...

Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization

We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization, and an efficient alternating gradient descent algorithm ...

Tree Mapping Template for Prosodic Phrase Boundary Prediction

This paper presents a novel method driven by tree mapping template (TMT) which improves the accuracy of prosodic phrase boundary prediction. The TMT is capable of capturing the isomorphic relation between non-terminal nodes in hierarchical prosodic tree and nodes in binary tree approximation, performing pruning at the decoding phase and revising the baseline maximum entropy model with boosting m...


Journal title:

Volume   Issue 

Pages  -

Publication date: 2012